7 research outputs found

    Web pages: What can you see in a single fixation?

    Research in human vision suggests that in a single fixation, humans can extract a significant amount of information from a natural scene, e.g., the semantic category, spatial layout, and object identities. This ability is useful, for example, for quickly determining location, navigating around obstacles, detecting threats, and guiding eye movements to gather more information. In this paper, we ask a new question: What can we see at a glance at a web page – an artificial yet complex “real world” stimulus? Is it possible to notice the type of website, or where the relevant elements are, with only a glimpse? We find that observers, fixating at the center of a web page shown for only 120 milliseconds, are well above chance at classifying the page into one of ten categories. Furthermore, this ability is supported in part by text that they can read at a glance. Users can also understand the spatial layout well enough to reliably localize the menu bar and to detect ads, even though the latter are often camouflaged among other graphical elements. We discuss the parallels between web page gist and scene gist, and the implications of our findings for both vision science and human-computer interaction.
    Funding: Google Award

    Bridging text spotting and SLAM with junction features

    Navigating in a previously unknown environment and recognizing naturally occurring text in a scene are two important autonomous capabilities that are typically treated as distinct. However, these two tasks are potentially complementary: (i) scene and pose priors can benefit text spotting, and (ii) the ability to identify and associate text features can benefit navigation accuracy through loop closures. Previous approaches to autonomous text spotting typically require significant training data and are too slow for real-time implementation. In this work, we propose a novel high-level feature descriptor, the “junction”, which is particularly well-suited to text representation and is also fast to compute. We show that we are able to improve SLAM through text spotting on datasets collected with a Google Tango device, illustrating how location priors enable improved loop closure with text features.
    Funding: Andrea Bocelli Foundation; East Japan Railway Company; United States. Office of Naval Research (N00014-10-1-0936, N00014-11-1-0688, N00014-13-1-0588); National Science Foundation (U.S.) (IIS-1318392)
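
    The abstract does not spell out how the “junction” descriptor is computed. As a loose, hypothetical illustration of the general idea of characterizing a point by the stroke branches that meet there, the Python sketch below counts branch crossings on a ring around a candidate point in a binarized image; the function name, ring radius, and counting rule are assumptions, not the authors' descriptor.

```python
import numpy as np

def junction_signature(binary_img, y, x, radius=5):
    """Characterize a candidate junction point by the stroke branches that
    cross a circular ring around it. Hypothetical illustration only; this is
    not the descriptor proposed in the paper."""
    h, w = binary_img.shape
    angles = np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False)
    ys = np.clip(np.round(y + radius * np.sin(angles)).astype(int), 0, h - 1)
    xs = np.clip(np.round(x + radius * np.cos(angles)).astype(int), 0, w - 1)
    ring = binary_img[ys, xs]
    # Each contiguous run of stroke pixels on the ring counts as one branch.
    run_starts = np.logical_and(ring == 1, np.roll(ring, 1) == 0)
    n_branches = int(run_starts.sum())
    # Record the angle at which each branch run begins as a crude orientation code.
    branch_angles = [float(a) for a, s in zip(angles, run_starts) if s]
    return n_branches, branch_angles

# A synthetic "+"-shaped crossing of two strokes has four branches.
img = np.zeros((21, 21), dtype=int)
img[10, :] = 1   # horizontal stroke
img[:, 10] = 1   # vertical stroke
print(junction_signature(img, 10, 10))   # expect 4 branches
```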

    Search performance is better predicted by tileability than presence of a unique basic feature

    Traditional models of visual search, such as feature integration theory (FIT; Treisman & Gelade, 1980), have suggested that a key factor determining task difficulty is whether or not the search target contains a “basic feature” not found in the other display items (distractors). Here we discriminate between such traditional models and our recent texture tiling model (TTM) of search (Rosenholtz, Huang, Raj, Balas, & Ilie, 2012b) by designing new experiments that directly pit these models against each other. Doing so is nontrivial, for two reasons. First, the visual representation in TTM is fully specified and makes clear, testable predictions, but its complexity makes getting intuitions difficult. Here we elucidate a rule of thumb for TTM, which enables us to easily design new and interesting search experiments. FIT, on the other hand, is somewhat ill-defined and hard to pin down. To get around this, rather than designing totally new search experiments, we start with five classic experiments that FIT already claims to explain: T among Ls, 2 among 5s, Q among Os, O among Qs, and an orientation/luminance-contrast conjunction search. We find that fairly subtle changes in these search tasks lead to significant changes in performance, in a direction predicted by TTM, providing definitive evidence in favor of the texture tiling model as opposed to traditional views of search.
    Funding: National Eye Institute (R01-EY021473)

    Pooling of continuous features provides a unifying account of crowding

    Visual crowding refers to phenomena in which the perception of a peripheral target is strongly affected by nearby flankers. Observers often report seeing the stimuli as “jumbled up,” or otherwise confuse the target with the flankers. Theories of visual crowding contend over which aspect of the stimulus gets confused in peripheral vision. Attempts to test these theories have led to seemingly conflicting results, with some experiments suggesting that the mechanism underlying crowding operates on unbound features like color or orientation (Parkes, Lund, Angelucci, Solomon, & Morgan, 2001), while others suggest it “jumbles up” more complex features, or even objects like letters (Korte, 1923). Many of these theories operate on discrete features of the display items, such as the orientation of each line or the identity of each item. By contrast, here we examine the predictions of the Texture Tiling Model, which operates on continuous feature measurements (Balas, Nakano, & Rosenholtz, 2009). We show that the main effects of three studies from the crowding literature are consistent with the predictions of the Texture Tiling Model. This suggests that many of the stimulus-specific curiosities surrounding crowding are the inherent result of the informativeness of a rich set of image statistics for the particular tasks.
    Funding: National Institutes of Health (U.S.) (NIH-NEI EY021473); National Science Foundation (U.S.) (NSF Graduate Research Fellowship)
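
    As a loose illustration of what “operating on continuous feature measurements” can mean, in contrast to representing discrete item identities, the sketch below pools an orientation feature map within a region into a few summary statistics. This is a hypothetical toy, not the Texture Tiling Model's actual statistic set, which is far richer.

```python
import numpy as np

def pool_continuous_features(feature_map, center, size, n_bins=8):
    """Summarize a continuous feature map (here, local orientation in radians)
    within a square pooling region by summary statistics rather than by the
    identities of discrete items. Toy illustration; the Texture Tiling Model
    uses a much richer set of image statistics."""
    cy, cx = center
    half = size // 2
    region = feature_map[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
    values = region.ravel()
    hist, _ = np.histogram(values, bins=n_bins, range=(0.0, np.pi), density=True)
    return {"mean": float(values.mean()),
            "var": float(values.var()),
            "orientation_hist": hist}

# A pooling region containing a differently oriented "target" patch among
# roughly uniform "flanker" texture: the target survives only as a shift in
# the pooled statistics, not as a separately represented item.
rng = np.random.default_rng(0)
orientations = rng.normal(loc=np.pi / 4, scale=0.1, size=(64, 64))
orientations[30:34, 30:34] = 3 * np.pi / 4
print(pool_continuous_features(orientations, center=(32, 32), size=32))
```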

    A general account of peripheral encoding also predicts scene perception performance

    People are good at rapidly extracting the "gist" of a scene at a glance, meaning with a single fixation. It is generally presumed that this performance cannot be mediated by the same encoding that underlies tasks such as visual search, for which researchers have suggested that selective attention may be necessary to bind features from multiple preattentively computed feature maps. This has led to the suggestion that scenes might be special, perhaps utilizing an unlimited-capacity channel, perhaps due to brain regions dedicated to this processing. Here we test whether a single encoding might instead underlie all of these tasks. In our study, participants performed various navigation-relevant scene perception tasks while fixating photographs of outdoor scenes. Participants answered questions about scene category, spatial layout, geographic location, or the presence of objects. We then asked whether an encoding model previously shown to predict performance in crowded object recognition and visual search might also underlie performance on those tasks. We show that this model does a reasonably good job of predicting performance on these scene tasks, suggesting that scene tasks may not be so special; they may rely on the same underlying encoding as search and crowded object recognition. We also demonstrate that a number of alternative "models" of the information available in the periphery also do a reasonable job of predicting performance at the scene tasks, suggesting that scene tasks alone may not be ideal for distinguishing between models.
    Keywords: scene perception; peripheral vision; crowding; parafoveal vision; navigation
    Funding: National Science Foundation (U.S.) (Award IIS-1607486)

    Effects of temporal and spatiotemporal cues on detection of dynamic road hazards

    While driving, dangerous situations can occur quickly, and giving drivers extra time to respond may make the road safer for everyone. Extensive research on attentional cueing in cognitive psychology has shown that targets are detected faster when preceded by a spatially valid cue, and slower when preceded by an invalid cue. However, it is unknown how these standard laboratory-based cueing effects translate to dynamic, real-world situations like driving, where potential targets (i.e., hazardous events) are inherently more complex and variable. Observers in our study were required to correctly localize hazards in dynamic road scenes across three cue conditions (temporal, spatiotemporal valid, and spatiotemporal invalid) and a no-cue baseline. All cues were presented at the first moment the hazardous situation began. Both types of valid cues reduced reaction time (by 58 and 60 ms, respectively, with no significant difference between them; a larger effect than in many classic studies). In addition, observers’ ability to accurately localize hazards dropped 11% in the spatiotemporal invalid condition, a result with dangerous implications on the road. This work demonstrates that, in spite of this added complexity, classic cueing effects persist, and may even be enhanced, for the detection of real-world hazards, and that valid cues have the potential to benefit drivers on the road.

    Detection of brake lights while distracted: Separating peripheral vision from cognitive load

    Drivers rarely focus exclusively on driving, even with the best of intentions. They are distracted by passengers, navigation systems, smartphones, and driver assistance systems. Driving itself requires performing simultaneous tasks, including lane keeping, looking for signs, and avoiding pedestrians. The dangers of multitasking while driving, and efforts to combat it, often focus on the distraction itself, rather than on how a distracting task can change what the driver can perceive. Critically, some distracting tasks require the driver to look away from the road, which forces the driver to use peripheral vision to detect driving-relevant events. As a consequence, both looking away and being distracted may degrade driving performance. To assess the relative contributions of these factors, we conducted a laboratory experiment in which we separately varied cognitive load and point of gaze. Subjects performed a visual 0-back or 1-back task at one of four fixation locations superimposed on a real-world driving video, while simultaneously monitoring for brake lights in their lane of travel. Subjects were able to detect brake lights in all conditions, but as the eccentricity of the brake lights increased, they responded more slowly and missed more braking events. However, our cognitive load manipulation had minimal effects on detection performance, reaction times, or miss rates for brake lights. These results suggest that, for tasks that require the driver to look off-road, the decrements observed may be due to the need to use peripheral vision to monitor the road, rather than to the distraction itself.
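
    For readers unfamiliar with the load manipulation, the sketch below illustrates the 0-back/1-back response rule under its standard definition. It is a generic sketch of the paradigm; the study's exact stimuli, timing, and response mapping are not reproduced here.

```python
def n_back_responses(stimuli, n, target=None):
    """Return, for each trial, whether the subject should respond 'match'.
    For 0-back, a trial matches when the stimulus equals a fixed target;
    for n >= 1, it matches when the stimulus equals the one shown n trials
    earlier. Generic sketch of the paradigm, not the study's exact task."""
    responses = []
    for i, s in enumerate(stimuli):
        if n == 0:
            responses.append(s == target)
        else:
            responses.append(i >= n and s == stimuli[i - n])
    return responses

stimuli = ["A", "B", "B", "A", "A", "C"]
print(n_back_responses(stimuli, n=0, target="A"))  # [True, False, False, True, True, False]
print(n_back_responses(stimuli, n=1))              # [False, False, True, False, True, False]
```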